Evaluating neural network performance is critical to deep neural network design but a costly procedure. Neural predictors provide an efficient solution by treating architectures as samples and learning to estimate their performance on a given task. However, existing predictors are task-dependent, predominantly estimating neural network performance on image classification benchmarks. They are also search-space dependent; each predictor is designed to make predictions for a specific architecture search space with predefined topologies and set of operations. In this paper, we propose a novel All-in-One Predictor (AIO-P), which aims to pretrain neural predictors on architecture examples from multiple, separate computer vision (CV) task domains and multiple architecture spaces, and then transfer to unseen downstream CV tasks or neural architectures. We describe our proposed techniques for general graph representation, efficient predictor pretraining and knowledge infusion techniques, as well as methods to transfer to downstream tasks/spaces. Extensive experimental results show that AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively, on a breadth of target downstream CV tasks with or without fine-tuning, outperforming a number of baselines. Moreover, AIO-P can directly transfer to new architectures not seen during training, accurately rank them and serve as an effective performance estimator when paired with an algorithm designed to preserve performance while reducing FLOPs.
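The two metrics quoted in the abstract above, Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC), can be illustrated with a minimal sketch. The function names and the accuracy values used below are hypothetical and chosen only for illustration; they are not taken from the AIO-P paper or its code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted performance."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rank_correlation(y_true, y_pred):
    """Pearson correlation computed on the ranks of the two sequences."""
    r_true, r_pred = average_ranks(y_true), average_ranks(y_pred)
    n = len(r_true)
    m_true, m_pred = sum(r_true) / n, sum(r_pred) / n
    cov = sum((a - m_true) * (b - m_pred) for a, b in zip(r_true, r_pred))
    var_true = sum((a - m_true) ** 2 for a in r_true)
    var_pred = sum((b - m_pred) ** 2 for b in r_pred)
    return cov / (var_true * var_pred) ** 0.5


# Hypothetical accuracies (in %) of four architectures and a predictor's estimates.
true_acc = [71.2, 69.8, 73.5, 70.4]
pred_acc = [70.9, 70.1, 72.8, 70.0]
mae = mean_absolute_error(true_acc, pred_acc)
srcc = spearman_rank_correlation(true_acc, pred_acc)
```

MAE measures how close the estimates are in absolute terms, while SRCC only measures whether the predictor orders the architectures correctly, which is why the two are reported together.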
Translated by Google Translate
Predicting neural architecture performance is a challenging task and is crucial to neural architecture design and search. Existing approaches either rely on neural performance predictors which are limited to modeling architectures in a predefined design space involving specific sets of operators and connection rules, and cannot generalize to unseen architectures, or resort to zero-cost proxies which are not always accurate. In this paper, we propose GENNAPE, a Generalized Neural Architecture Performance Estimator, which is pretrained on open neural architecture benchmarks, and aims to generalize to completely unseen architectures through combined innovations in network representation, contrastive pretraining, and fuzzy clustering-based predictor ensemble. Specifically, GENNAPE represents a given neural network as a Computation Graph (CG) of atomic operations which can model an arbitrary architecture. It first learns a graph encoder via Contrastive Learning to encourage network separation by topological features, and then trains multiple predictor heads, which are soft-aggregated according to the fuzzy membership of a neural network. Experiments show that GENNAPE pretrained on NAS-Bench-101 can achieve superior transferability to 5 different public neural network benchmarks, including NAS-Bench-201, NAS-Bench-301, MobileNet and ResNet families under no or minimum fine-tuning. We further introduce 3 challenging newly labelled neural network benchmarks: HiAML, Inception and Two-Path, which can concentrate in narrow accuracy ranges. Extensive experiments show that GENNAPE can correctly discern high-performance architectures in these families. Finally, when paired with a search algorithm, GENNAPE can find architectures that improve accuracy while reducing FLOPs on three families.
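The soft aggregation of predictor heads by fuzzy membership described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard fuzzy C-means membership formula, and the embeddings, cluster centers, head predictions, and function names are all hypothetical rather than taken from GENNAPE.

```python
import math


def fuzzy_memberships(embedding, centers, fuzziness=2.0):
    """Standard fuzzy C-means memberships of one embedding w.r.t. cluster centers.

    Assumption: Euclidean distance and fuzziness m > 1. Memberships sum to 1.
    """
    dists = [math.dist(embedding, c) for c in centers]
    exact = [i for i, d in enumerate(dists) if d == 0.0]
    if exact:
        # The embedding coincides with one or more centers: split the mass among them.
        return [1.0 / len(exact) if i in exact else 0.0 for i in range(len(centers))]
    power = 2.0 / (fuzziness - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def soft_aggregate(head_predictions, memberships):
    """Membership-weighted sum of the per-cluster predictor head outputs."""
    return sum(u * p for u, p in zip(memberships, head_predictions))


# Hypothetical 2-D graph embedding, two cluster centers, and two head outputs.
memberships = fuzzy_memberships([0.0, 0.0], [[1.0, 0.0], [2.0, 0.0]])
prediction = soft_aggregate([0.9, 0.7], memberships)
```

A network close to one cluster center is scored almost entirely by that cluster's head, while a network between clusters receives a blend of several heads.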
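Volatility clustering is a common phenomenon in financial time series. Typically, linear models are used to describe the temporal autocorrelation of the (logarithmic) variance of returns. Considering the difficulty of estimating such models, we construct a dynamic Bayesian network that exploits the conjugate prior relationships of normal-gamma and gamma-gamma, so that the posterior form at each node remains locally unchanged. This makes it possible to find approximate solutions quickly using variational methods. Furthermore, we ensure that the volatility expressed by the model is an independent-increment process after inserting dummy gamma nodes between adjacent time steps. We find that the model has two advantages: 1) it can be proved that, compared with popular linear models, it can express heavier tails than a Gaussian (i.e., positive excess kurtosis); 2) if variational inference (VI) is used for state estimation, it runs much faster than Monte Carlo (MC) methods, because the posterior is computed using only basic arithmetic operations, and its convergence process is deterministic. We tested the model, named Gam-Chain, on recent crypto, Nasdaq, and forex records at different resolutions. The results show that: 1) when both use MC, the model achieves state estimation results comparable to those of the regular lognormal chain; 2) when using only VI, the model obtains accuracy slightly worse than with MC, but still acceptable in practice; 3) using only VI, the running time of Gam-Chain under the most conservative settings can be reduced to below 20% of that of the lognormal chain with MC.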
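To illustrate why conjugate relationships let a posterior be computed with basic arithmetic only, here is a minimal sketch of two textbook conjugate updates. It is not the Gam-Chain model itself: the zero-mean Gaussian likelihood, the shape/rate parameterization, the function names, and the numbers are all assumptions made for illustration.

```python
def update_precision(shape, rate, returns):
    """Normal-gamma style update with a known zero mean (assumption).

    Prior: precision ~ Gamma(shape, rate); likelihood: r ~ Normal(0, 1/precision).
    The posterior is again a gamma distribution with the parameters returned here.
    """
    n = len(returns)
    return shape + 0.5 * n, rate + 0.5 * sum(r * r for r in returns)


def update_gamma_rate(prior_shape, prior_rate, known_shape, observations):
    """Gamma-gamma update: gamma prior on the rate of a gamma likelihood
    whose shape is known. The posterior stays in the gamma family."""
    n = len(observations)
    return prior_shape + n * known_shape, prior_rate + sum(observations)


# Hypothetical returns and prior parameters.
post_shape, post_rate = update_precision(2.0, 1.0, [0.5, -1.0, 1.5])
posterior_mean_precision = post_shape / post_rate
rate_shape, rate_rate = update_gamma_rate(1.0, 1.0, 2.0, [0.5, 1.5])
```

Both updates only add and multiply a few numbers, so no sampling is needed and repeating the computation always gives the same result.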
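Recently, we released WeNet, a production-oriented end-to-end speech recognition toolkit that introduces a unified two-pass (U2) framework and a built-in runtime to handle both streaming and non-streaming decoding modes in a single model. To further improve ASR performance and meet various production requirements, in this paper we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which incorporates future context information through a right-to-left attention decoder to improve the representation ability of the shared encoder and the performance of the rescoring stage. (2) We introduce an n-gram-based language model and a WFST-based decoder into WeNet 2.0, facilitating the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework that leverages user-specific context (e.g., contact lists) to provide rapid adaptation in production and improves ASR accuracy both with and without an LM. (4) We design a unified IO to support large-scale data for efficient model training. In summary, the new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and provides several important production-oriented features.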
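Cross-domain recommendation can help alleviate the data sparsity issue in traditional sequential recommender systems. In this paper, we propose the RecGURU algorithm framework to generate a generalized user representation (GUR) that incorporates user information across domains in sequential recommendation, even when there are few or no common users in the two domains. We propose a self-attentive autoencoder to derive latent user representations, and a domain discriminator that aims to predict the origin domain of a generated latent representation. We propose a novel adversarial learning method to train the two modules so that user embeddings generated from different domains are unified into a single global GUR for each user. The learned GUR captures the overall preferences and characteristics of a user and can thus be used to augment behavior data and improve recommendations in any single domain in which the user is involved. Extensive experiments were conducted on two public cross-domain recommendation datasets as well as a large dataset collected from real-world applications. The results show that RecGURU improves performance and outperforms various state-of-the-art sequential recommendation and cross-domain recommendation methods. The collected data will be released to facilitate future research.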
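Although conceptualization has been widely studied in semantics and knowledge representation, it is still challenging to find the most accurate concept phrases to characterize the main idea of a text snippet on fast-growing social media. This is partly attributed to the fact that most knowledge bases contain general terms of the world, such as trees and cars, which do not have defining power or are not interesting enough to users of social media applications. Another reason is that the intricacy of natural language allows tense, negation, and grammar to change the logic or emphasis of language, thus conveying completely different meanings. In this paper, we present TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-style natural language sentences, mined from open-domain social media. The concepts we consider represent the trending interests of online users. Associated with TAG is a concept graph of these fine-grained concepts and entities, which provides structural context information. We evaluate a wide range of popular neural text matching models as well as pre-trained language models on TAG, and point out their insufficiency in tagging social media content with the most appropriate concept. We further propose a novel graph matching method that demonstrates superior abstraction and generalization performance by better utilizing both the structural context in the concept graph and the logical interactions between semantic units in the sentence, obtained via syntactic dependency parsing. We open-source both the TAG dataset and the proposed methods to facilitate further research.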
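PennyLane is a Python 3 software framework for differentiable programming of quantum computers. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane's core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for hardware providers including Xanadu Cloud, Amazon Braket, and IBM Quantum, allowing PennyLane optimizations to be run on publicly accessible quantum devices. On the classical side, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, JAX, and Autograd. PennyLane can be used to optimize variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications.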
Pure transformers have recently shown great potential for vision tasks. However, their accuracy on small or medium datasets is not satisfactory. Although some existing methods introduce a CNN as a teacher to guide the training process by distillation, the gap between teacher and student networks leads to sub-optimal performance. In this work, we propose a new One-shot Vision transformer search framework with Online distillation, namely OVO. OVO samples sub-nets for both teacher and student networks for better distillation results. Benefiting from the online distillation, thousands of subnets in the supernet are well-trained without extra finetuning or retraining. In experiments, OVO-Ti achieves 73.32% top-1 accuracy on ImageNet and 75.2% on CIFAR-100.
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and quantify the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge for designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes by first embedding the manifold onto some higher dimensional Euclidean space via equivariant embeddings and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. Simulation study and real data analysis are carried out to demonstrate the utilities of our eBO framework by applying the eBO to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices.
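The idea of building a covariance kernel on the image of the manifold after embedding can be sketched for the simplest case, the unit sphere. The sketch below assumes the inclusion of the sphere into three-dimensional Euclidean space as the embedding and a squared-exponential kernel on the embedded points; these choices, the function names, and the length scale are illustrative assumptions, not the eBO implementation.

```python
import math


def embed_sphere(theta, phi):
    """Map spherical coordinates (polar angle theta, azimuth phi) to a point in R^3."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def extrinsic_rbf_kernel(p, q, length_scale=1.0):
    """Squared-exponential kernel evaluated on the embedded points.

    Restricting a positive definite kernel on R^3 to the embedded sphere
    keeps it positive definite, so it is a valid covariance on the sphere.
    """
    x, y = embed_sphere(*p), embed_sphere(*q)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def gram_matrix(points, length_scale=1.0):
    """Covariance matrix of a Gaussian process at the given sphere points."""
    return [[extrinsic_rbf_kernel(p, q, length_scale) for q in points] for p in points]


# Hypothetical query points: north pole, a point on the equator, south pole.
points = [(0.0, 0.0), (math.pi / 2, 0.0), (math.pi, 0.0)]
K = gram_matrix(points)
```

The Gram matrix `K` can then be used as the covariance of a Gaussian process surrogate in the usual way, which sidesteps the difficulty of defining a valid kernel intrinsically on the manifold.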
Optical flow, which computes the apparent motion from a pair of video frames, is a critical tool for scene motion estimation. The correlation volume is the central component of neural models for optical flow computation. It estimates the pairwise matching costs between cross-frame features and is then used to decode optical flow. However, the traditional correlation volume is frequently noisy, outlier-prone, and sensitive to motion blur. We observe that, although the recent RAFT algorithm also adopts the traditional correlation volume, its additional context encoder provides semantically representative features to the flow decoder, implicitly compensating for the deficiency of the correlation volume. However, the benefits of this context encoder have barely been discussed or exploited. In this paper, we first investigate the functionality of RAFT's context encoder, then propose a new Context Guided Correlation Volume (CGCV) via gating and lifting schemes. CGCV can be universally integrated with RAFT-based flow computation methods for enhanced performance, and is especially effective in the presence of motion blur, de-focus blur, and atmospheric effects. By incorporating the proposed CGCV into the previous Global Motion Aggregation (GMA) method, at a minor cost of 0.5% extra parameters, the rank of GMA is lifted by 23 places on the KITTI 2015 leaderboard and by 3 places on the Sintel leaderboard. Moreover, at a similar model size, our correlation volume achieves competitive or superior performance compared with state-of-the-art supervised peer models that employ Transformers or graph reasoning, as verified by extensive experiments.
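A traditional correlation volume of the kind discussed above can be sketched as all-pairs dot products between the feature vectors of two frames. The sketch below is a plain-Python illustration under stated assumptions: features are nested lists, the scores are scaled by the square root of the feature dimension, and the function names and tiny example frames are hypothetical. It does not implement the proposed CGCV gating or lifting.

```python
import math


def correlation_volume(feat1, feat2):
    """All-pairs matching scores between two H x W grids of D-dim features.

    Returns corr with corr[i][j][k][l] = <feat1[i][j], feat2[k][l]> / sqrt(D).
    The 1/sqrt(D) scaling is an assumption of this sketch.
    """
    dim = len(feat1[0][0])
    scale = 1.0 / math.sqrt(dim)
    return [
        [
            [[scale * sum(a * b for a, b in zip(v1, v2)) for v2 in row2] for row2 in feat2]
            for v1 in row1
        ]
        for row1 in feat1
    ]


def best_match(corr, i, j):
    """Position (k, l) in frame 2 with the highest score for pixel (i, j) of frame 1."""
    scores = corr[i][j]
    candidates = [(k, l) for k in range(len(scores)) for l in range(len(scores[0]))]
    return max(candidates, key=lambda kl: scores[kl[0]][kl[1]])


# Hypothetical 1 x 2 frames with 2-D features; frame 2 is frame 1 with the pixels swapped.
frame1 = [[[1.0, 0.0], [0.0, 1.0]]]
frame2 = [[[0.0, 1.0], [1.0, 0.0]]]
corr = correlation_volume(frame1, frame2)
match_for_first_pixel = best_match(corr, 0, 0)
```

Because every score depends only on the raw feature similarity, blurred or ambiguous features produce noisy scores, which is the weakness that context guidance is meant to address.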